Some Training Subset Selection Methods for Supervised Learning in Genetic Programming
Abstract
When using the Genetic Programming (GP) algorithm on a difficult problem with a large set of training cases, a large population size is needed and a very large number of function-tree evaluations must be carried out. This paper describes how to reduce the number of such evaluations by selecting a small subset of the training data set on which to actually carry out the GP algorithm. The three subset selection methods described in the paper are: Dynamic Subset Selection (DSS), which uses the current GP run to select 'difficult' and/or disused cases; Historical Subset Selection (HSS), which uses previous GP runs; and Random Subset Selection (RSS). GP, GP+DSS, GP+HSS and GP+RSS are compared on a large classification problem. GP+DSS can produce better results in less than 20% of the time taken by GP. GP+HSS can nearly match the results of GP and, perhaps surprisingly, GP+RSS can occasionally approach the results of GP. GP and GP+DSS are then compared on a smaller problem, and a hybrid Dynamic Fitness Function (DFF), based on DSS, is proposed.
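The idea behind RSS and DSS can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the function names, the per-case `difficulty` and `age` counters, the weighting `1 + difficulty**d_exp + age**a_exp`, and the update rule are hypothetical and may differ from the method in the paper.

```python
import random


def random_subset(n_cases, subset_size, rng):
    """RSS-style selection: draw distinct case indices uniformly at random."""
    return rng.sample(range(n_cases), min(subset_size, n_cases))


def dynamic_subset(difficulty, age, subset_size, rng, d_exp=1.0, a_exp=1.0):
    """DSS-style selection (assumed weighting, not taken from the paper).

    Cases that were misclassified often (high difficulty) or that have not
    been selected for many generations (high age) get a larger weight.
    Sampling is weighted and without replacement.
    """
    weights = {
        i: 1.0 + d ** d_exp + a ** a_exp
        for i, (d, a) in enumerate(zip(difficulty, age))
    }
    chosen = []
    for _ in range(min(subset_size, len(weights))):
        ids = list(weights)
        pick = rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]
        chosen.append(pick)
        del weights[pick]  # remove so the same case is not drawn twice
    return chosen


def update_stats(difficulty, age, subset, misclassified):
    """Assumed bookkeeping after one generation.

    `misclassified` maps a case index to the number of population members
    that got it wrong this generation. Selected cases have their difficulty
    replaced by that count and their age reset; all other cases grow older.
    """
    selected = set(subset)
    for i in range(len(age)):
        if i in selected:
            difficulty[i] = misclassified.get(i, 0)
            age[i] = 1
        else:
            age[i] += 1


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20
    difficulty = [0] * n
    age = [1] * n
    subset = dynamic_subset(difficulty, age, 5, rng)
    # Pretend the first selected case was misclassified by 7 individuals.
    update_stats(difficulty, age, subset, {subset[0]: 7})
    print(sorted(subset), difficulty[subset[0]], age[subset[0]])
```

In a GP run under these assumptions, a fresh subset would be drawn each generation and only those cases evaluated, which is where the saving in function-tree evaluations would come from.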
Similar resources
Dynamic Training Subset Selection for Supervised Learning in Genetic Programming
When using the Genetic Programming (GP) Algorithm on a difficult problem with a large set of training data, a large population size is needed and a very large number of function-tree evaluations must be carried out. This paper describes some efforts made to reduce the number of such evaluations by concentrating on selecting a small subset of the training data set on which to actually carry out th...
Towards Efficient Training on Large Datasets for Genetic Programming
Genetic programming (GP) has the potential to provide unique solutions to a wide range of supervised learning problems. The technique, however, does suffer from a widely acknowledged computational overhead. As a consequence, applications of GP are often confined to datasets consisting of hundreds of training exemplars as opposed to tens of thousands of exemplars, thus limiting the widespread app...
A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems
Intrusion detection systems are designed to provide security in computer networks, so that if an attacker gets past other security devices, they can detect and prevent the attack. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...
Comparative Study of Attribute Selection Using Gain Ratio and Correlation Based Feature Selection
Feature subset selection is of great importance in the field of data mining. High-dimensional data make the testing and training of general classification methods difficult. In the present paper, two filter approaches, namely Gain ratio and Correlation based feature selection, have been used to illustrate the significance of feature subset selection for classifying Pima Indian diabetic database (P...